Reduce expansion from type and contract generation. #633

Open
wants to merge 14 commits into
base: master

Sponsor Member

@samth samth commented Oct 17, 2017

This reduces zo size in plot-gui-lib by about 11x.

@jackfirth
Sponsor Contributor

> This reduces zo size in plot-gui-lib by about 11x.

Whoa

@@ -43,3 +43,8 @@
;;
;; Also, this type works better with inference.
(-> (make-Prompt-Tagof Univ (-> Univ ManyUniv)))))))

(begin-for-syntax
Member

comment w/ brief description?

@@ -78,6 +78,11 @@

(define-syntax (-#%module-begin stx)
(syntax-parse stx
[(mb e0 e ...)
#:when (eq? '#:no-add-mod (syntax-e #'e0))
#'(#%plain-module-begin
Member

comment?

(dynamic-require (module-path-index-join '(submod "." #%type-decl) m)
#f))))

(provide add-mod! do-requires)
(define (adjust p)
(match p
Member

comment? is this just sexp -> sexp? what is it adjusting and why?

[_ p]))

(define (->mp mpi submod)
(collapse-module-path-index (module-path-index-join `(submod "." ,submod) mpi)))
Member

signature?

@@ -399,8 +407,13 @@
(if (from-typed? typed-side)
(and/sc sc any-wrap/sc)
sc))
;(eprintf "predef: ~s ~s ~s\n" predef-contracts type typed-side)
(cached-match
Member

@pnwamk pnwamk Oct 17, 2017

printfs -- fyi

@@ -996,6 +1010,9 @@
(define extflnonnegative? (lambda (x) (extfl>= x 0.0t0)))
(define extflnonpositive? (lambda (x) (extfl<= x 0.0t0))))

(require (submod "../static-contracts/instantiate.rkt" predefined-contracts))
;(hash-set! predef-contracts (cons Univ 'typed) (cons #'any/c 'flat))

Member

what is this commented out hash-set!?

(define (function-contract? stx)
(syntax-case stx ()
[(arr . _)
(and (identifier? #'arr)
Member

this is really low importance, but I've been trying to avoid using arr because people so easily confuse "arrow", "arity", etc and conversations about them can be really confusing if taken literally when they are used incorrectly. maybe it doesn't matter here.

@@ -479,6 +479,7 @@
(define/with-syntax (new-defs ...) defs)
(define/with-syntax (new-export-defs ...) export-defs)
(define/with-syntax (new-provs ...) provs)
(do-contract-requires)
(values
Member

is it important that do-contract-requires appears here (and not earlier or later)? Does it just need to come before the get-contract-requires below? If so, would it be clearer to have a function that just does both in the right order? (i.e. does the work of do-contract-requires and then returns the result from get-contract-requires?)

Member

Or if not, maybe a comment about why this effectful procedure is called there?

Sponsor Member Author

No, do-contract-requires is for side-effect before any contracts are generated, so it has to be earlier. I'll find the logical place to put it and add a comment. get-contract-requires doesn't depend on those side effects, and just needs to be called to put the output in the correct submodule.

(-Arrow doms rng))

(define (simple->values doms rng)
(->* doms (make-Values (map -result rng))))
Member

I read this identifier as "simple to values"... but it's really more like "simple arrow with values" or something, right?

Sponsor Member Author

Yes. This just exists so that the expansion of type serialization is smaller.

Member

@pnwamk pnwamk left a comment

Overall looks great -- I made a few comments about adding some prose and/or signatures to help future maintainers.

One question/comment -- I was surprised to see how much RAW looking code is now appearing in typed-racket-more (e.g. typed-racket-more/typed/racket/draw.rkt). I worry that we're exposing more implementation details outside of TR's implementation, and there should be some simpler API that we instead define and maintain. Are we really expecting maintainers of other libraries to write stuff like that in their adapter modules?

@rfindler
Member

@samth can you say a little bit about high-level strategy here? I'm of the naive opinion that when one writes a module in typed racket that exports, say, identifiers f and g with types T and S, then there could be another module made at that same point that contains, roughly, (provide (contract-out [f T] [g S])) and then at each reference to the typed module from an untyped context, there doesn't need to be any contracts generated at all.

I'm clearly missing something (possibly not so) subtle and I'm definitely not on the critical path, but I am curious how wrong my understanding is, so if you have the time to fill me in, that'd be great.

@samth
Sponsor Member Author

samth commented Oct 17, 2017

@rfindler What you describe as your naive opinion is in fact the case currently, although it's a bit more complicated than that. Roughly, a module like this:

(module m typed/racket
  (provide f)
  (: f : Integer -> Integer)
  (define (f x) (+ x 5)))

currently expands to the following (with a few simplifications):

(module m typed/racket
  (provide (rename-out [f* f]))
  (define (f x) (+ x 5))
  (define-syntax (f* stx) (if typed-context? #'f #'contracted-f))
  (module* #%contract-defs #f 
    (define contracted-f (contract (-> exact-integer? exact-integer?) f 'typed 'untyped))
    (provide contracted-f))
  (add-mod! (#%variable-reference))
  (begin-for-syntax
    (module* #%type-decl #f 
      (hash-set! types-of-defined-things #'f (make-Function
                                              (list (make-BaseType 'Integer)) 
                                              (make-BaseType 'Integer))))))

Note that I've left out how the f* macro gets access to contracted-f; the answer involves some trickery. Also, contracted-f is defined using the same machinery that implements contract-out, so that it gets the blame right, etc.

One question you might have at this point is why all the submodules. First, the #%contract-defs submodule means that you don't need to load and execute the contracts if you don't need the contracted version of f. Second, the #%type-decl submodule means that you don't need to construct the types and initialize the hash table if you don't need that (for example, when just running the module after it's been compiled).

Another question you might have is what does add-mod! do. Well, since we stashed the hash table initialization in the #%type-decl submodule, we need to actually do that initialization in order to be able to type check other modules that might refer to f and need to know f's type. To do that, add-mod! registers the name of this module in a list which is created for the type checking of that other module. Then the type checker for the other module iterates over the list and dynamic-requires every module in it (or really the #%type-decl submodule), running the hash-set! calls and populating the environment. If the module requiring m isn't typed, then no one looks at the list, and those modules aren't required.
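The lazy registration described above can be sketched roughly as follows. This is a simplified model under stated assumptions, not the PR's code: thunks stand in for the #%type-decl submodules that the real implementation loads with dynamic-require, types are plain S-expressions, and `register-mod!` and `load-registered-mods!` are hypothetical names rather than the actual `add-mod!` and `do-requires`.

```racket
#lang racket/base

;; The type environment: maps a provided name to its type
;; (an S-expression stands in for a real type here).
(define types-of-defined-things (make-hash))

;; Modules seen so far whose type declarations have not been run yet.
;; Each entry is a thunk standing in for a #%type-decl submodule.
(define registered-mods '())

;; Called when a typed module is instantiated: only records the module.
(define (register-mod! init)
  (set! registered-mods (cons init registered-mods)))

;; Called by the type checker of a typed client: runs every pending
;; initialization in registration order, then clears the list.
(define (load-registered-mods!)
  (for ([init (in-list (reverse registered-mods))])
    (init))
  (set! registered-mods '()))

;; Module m registers itself; no types are constructed at this point.
(register-mod!
 (lambda ()
   (hash-set! types-of-defined-things 'f '(-> Integer Integer))))

;; Only a typed client triggers the initialization.
(load-registered-mods!)
(hash-ref types-of-defined-things 'f)
```

If no typed client ever calls `load-registered-mods!`, the thunk never runs, which mirrors the point that untyped clients do not pay for constructing the types.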

So, where does duplication come in? Well, imagine that you had two modules, both of which have an (-> Integer Integer) function provided from them. The #%contract-defs modules for each of them will have basically the same (-> exact-integer? exact-integer?) contract, which is duplicated work, and if that actual contract was very large, we'd have enormous compiled files, which is roughly what currently happens.

That's the background, and in the next comment I'll write about how this PR changes things to reduce duplication.

@rfindler
Member

I see I was not clear in my writing. From what I can tell, the compiled version of this program:

#lang typed/racket
(require typed/racket/gui)
(provide f)
(: f (-> (Instance Frame%)))
(define f
  (λ () (new frame% [label ""])))

contains an entire copy of the Frame% contract, instead of a reference to that contract.

(I will note that this is not like the duplication you discuss; I didn't write out the Frame% type in this file!)

@rfindler
Member

(PS: it also seems to contain copies of many other contracts; I think I see the text% contract and the dc<%> contract in there ... and maybe more?)

@samth
Sponsor Member Author

samth commented Oct 17, 2017

Ok, now that you've read the background, here's the high-level strategy:

  1. Maintain a table (at expansion time) mapping types to identifiers which are bound to the contract for that type.
  2. Use that table to generate just a reference to that identifier instead of the actual implementation of a contract when needed.
  3. Update that table in roughly the same manner that the type environment is currently updated in the #%type-decl submodules (note that this technique was originally invented by @mflatt in the You Want It When paper).

Oh, and we're going to do the same thing for types as well, since the serialization of types can be large.
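As a rough illustration of steps 1 and 2, here is a minimal sketch in plain Racket. It is not Typed Racket's implementation: types are modeled as S-expressions, symbols stand in for the identifiers bound to contracts, and `register-contract!` and `type->contract` are hypothetical names (only the table name echoes the discussion).

```racket
#lang racket/base

;; Step 1: a table from types to the name of an already-defined contract.
;; make-hash compares keys with equal?, so list-shaped types work as keys.
(define predefined-contracts (make-hash))

(define (register-contract! type name)
  (hash-set! predefined-contracts type name))

;; Step 2: prefer a reference to a shared definition; otherwise fall back
;; to generating the full contract expression.
(define (type->contract type)
  (cond
    [(hash-ref predefined-contracts type #f) => values]
    [(eq? type 'Integer) 'exact-integer?]
    [(and (pair? type) (eq? (car type) '->))
     (cons '-> (map type->contract (cdr type)))]
    [else (error 'type->contract "unsupported type: ~s" type)]))

;; Without a table entry the whole contract is generated inline.
(type->contract '(-> Integer Integer))

;; After registration, the same type yields just the reference C.
(register-contract! '(-> Integer Integer) 'C)
(type->contract '(-> Integer Integer))
```

The first call produces the full `(-> exact-integer? exact-integer?)` expression, while the second produces only `C`, which is the size saving the strategy is after.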

With that said, here's the outline of the new expansion (note that this isn't fully implemented yet):

(module m typed/racket
  (provide (rename-out [f* f]))
  (define (f x) (+ x 5))
  (define-syntax (f* stx) (if typed-context? #'f #'contracted-f))
  (add-mod! (#%variable-reference))
  (module* #%contract-defs #f 
    (define C (-> exact-integer? exact-integer?)) ;; new def
    (define contracted-f (contract C f 'typed 'untyped))
    (provide contracted-f))
  (begin-for-syntax
     (module* #%contract-defs-names #f
        (require (for-template (submod ".." #%contract-defs))
                 (submod ".." #%type-decl))
        (hash-set! predefined-contracts T #'C)))
  (begin-for-syntax
    (module* #%type-decl #f 
      (define T (make-Function (list (make-BaseType 'Integer)) (make-BaseType 'Integer))) ;; new def
      (provide T)
      (hash-set! predefined-types T #'T)
      (hash-set! types-of-defined-things #'f T))))

Several things have changed here. First, we've added definitions of C and T in the submodules that use them. That's helpful for being able to refer to them later (note that defining C this way would make the contract less optimized, so we wouldn't do that in practice). Second, we've got a new submodule, #%contract-defs-names. Third, we are mutating two new tables, predefined-types and predefined-contracts.

The new mutations set up two new tables, which are mappings that tell us that if we want an expression that evaluates to the type T, we can use #'T, and that if we want an expression that evaluates to a contract for T, we can use #'C. We have to have the update of predefined-contracts happen in a new separate submodule because types such as T, and the generation of syntax for contract expressions such as #'C, both live at phase 1, but the actual contract value C is at phase 0. We don't need this split for #%type-decl because types and expressions that produce types both happen at phase 1.

Given that, what can we do now? Consider this module:

(module user typed/racket
   (require 'm) (provide g)
   (: g : Integer -> Integer)
   (define (g x) (f (f x))))

Now, when we go to generate a contract for g, we can look in our predefined-contracts table, see that we have a match, and just use #'C for the contract, instead of generating the syntax for g's contract. Similarly, we have to put an expression that constructs g's type in the expansion of user, but we can just use T for that. So now we've avoided lots of duplication.

However, there's one more issue. Lots of Typed Racket files don't depend on any other modules written in typed/racket, but they do depend on core Racket libraries to which Typed Racket has ascribed a type. If those types are big, then everyone who uses them directly will have their own duplicated copy, even if any individual path through the dependency tree would have only one copy, even in the scheme I've outlined.

We've already faced this with type serialization, and the solution is to initialize the table with some commonly-used types so there's no duplication at all. That's what's going on in the manually-constructed #%contract-defs and #%contract-defs-names submodules in this pull request, in the base.rkt and draw.rkt files. We generate those contracts once, even though that file doesn't use them, and register them in the predefined-contracts table, so that everyone who tries to generate that contract can share the definition.

@rfindler
Member

rfindler commented Oct 17, 2017 via email

@samth
Sponsor Member Author

samth commented Oct 17, 2017

Right, this doesn't use yet another table, which implements the meaning of define-type. Currently, the strategy for when to create a definition for a contract (as opposed to just inlining it) is that it's done whenever there's not a loss to doing it from the contract system's optimizations for functions. I'll probably change that to do it only when the contract would be generated multiple times. But more generally, we will create a lot more contract definitions than we will type names in the sense of define-type, and we want to avoid duplicating them as well.

Also, there's another wrinkle, which is that for classes, we have type names for the class types, but most of the contracts refer to the instances, and sadly those don't share a contract.

Note that the part where it generates the contents of the #%contract-defs-names submodule is not yet implemented, but the rest of it, including several predefined contracts, is implemented. Thus your example module now generates a contract that looks like (->* () n1435) where n1435 is defined in (submod typed/racket/gui/base #%contract-defs).

@rfindler
Member

Just a point of clarification: when you write "....and sadly those don't share a contract", you mean that there isn't a type name? Surely if one had a contract that corresponds to the type Frame%, then one could just write (instanceof/c Frame/c) and avoid duplication.

Regardless, that n1435 is very exciting! (Although I must confess that Frame/c would be slightly more exciting. Perhaps for a future refinement 😄 )

Contributor

@bennn bennn left a comment

(This is in-progress right?)

  • Is #:no-add-mod being used because #lang typed/racket doesn't include typed/racket/gui/base etc.?
  • Are many other modules going to need to change like the ones here in typed-racket-more? If they do need changes, would it be possible to just change typed-racket/base-env/extra-env-lang so those "user" files can look prettier?

[(app (lambda (t) (hash-ref predef-contracts (cons t typed-side) #f))
(? values con-id))
;(eprintf "found a match ~s ~s\n" con-id type)
(impersonator/sc (syntax-local-introduce con-id))]
Contributor

Instead of always making an impersonator/sc, will this eventually check the kind and make a flat or chaperone when possible?

Sponsor Member Author

Yes.

@samth
Sponsor Member Author

samth commented Oct 18, 2017

#:no-add-mod is because I forgot at that point about the trick with #%plain-module-begin.

Yes, if other modules use extra-env-lang, they'll also have to grow these submodules. Probably we should add them by default and have a way to not add them for the ones that are needed.

@samth
Sponsor Member Author

samth commented Oct 18, 2017

@rfindler Unfortunately, it really can't always reuse class contracts to produce instance contracts. There are a few reasons for this. One is that when an instance is provided from Typed Racket, we need to use an opaque object contract; that's what happens in this program:

#lang typed/racket
(define c (class object%
            (define/public (m x) x)
            (super-new)))

(define d (new c))
(provide c d)

A second reason is that there's not just one contract for a given type, there's both the contract when the value comes from untyped code and the one when the value goes to untyped code. (There's also a third one when the same contract has to handle both.)
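To make the "more than one contract per type" point concrete, here is a hedged sketch of a table keyed on both the type and the side, loosely modeled on the `(hash-ref predef-contracts (cons t typed-side) #f)` lookup in the diff. Everything else is an assumption for illustration: the side tags other than 'typed, the contract names, the kinds attached to them, and the helper names `predefine!` and `lookup-predefined` are hypothetical.

```racket
#lang racket/base

;; Key: (cons type side); value: (cons contract-name kind).
;; The same type can therefore map to different contracts per direction.
(define predef-contracts (make-hash))

(define (predefine! type side name kind)
  (hash-set! predef-contracts (cons type side) (cons name kind)))

;; Returns (cons name kind), or #f when the contract must be generated.
(define (lookup-predefined type side)
  (hash-ref predef-contracts (cons type side) #f))

;; Hypothetical entries: one contract for values leaving typed code,
;; another for values arriving from untyped code.
(predefine! '(Instance Frame%) 'typed   'frame-out/c 'impersonator)
(predefine! '(Instance Frame%) 'untyped 'frame-in/c  'impersonator)

(lookup-predefined '(Instance Frame%) 'typed)
(lookup-predefined '(Instance Frame%) 'both)
```

The last lookup finds nothing because no entry was registered for that side, so a contract for it would still have to be generated. Storing the kind next to the name is one way a lookup could later choose a flat or chaperone wrapper instead of always using an impersonator.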

Fortunately, this isn't really a source of much duplication, because most modules have instances in their interfaces, rather than classes, and because the individual method contracts can be shared.

@samth
Sponsor Member Author

samth commented Oct 18, 2017

I've improved some of the bigger problems with this PR, and it's become clear that implementing the remaining big piece will be harder than I thought, so I'm considering moving forward with just this part first, especially since it's already a big win. I still plan to address the outstanding comments, of course.

@mflatt
Member

mflatt commented Oct 18, 2017

That sounds good to me. I'm interested to see the build plot (http://build-plot.racket-lang.org/) after this change, but I haven't gotten around to adding a way to make a build plot using your "reduce-expansion" branch instead of the one that pkgs.racket-lang.org reports.

@samth
Sponsor Member Author

samth commented May 22, 2018

I've fixed the use of gensym, and now the zo file sizes in plot seem similar, but I haven't done a full comparison. @pnwamk can you redo the comparison you did earlier in this discussion?

@pnwamk
Member

pnwamk commented May 22, 2018

It looks at a glance like library sizes (at least math and plot) have slightly increased:

Nightly -> PR
-------------------
9.2M -> 10M (math)
8.0M -> 8.5M (plot)

@pnwamk
Member

pnwamk commented May 22, 2018

The changes in plot-gui-lib might be more interesting:

Nightly -> PR
-------------------------
9.2M -> 10M (math)
8.0M -> 8.5M (plot)
328K -> 444K (plot-gui-lib)

Here's a spreadsheet with details from a particular directory in math and for plot-gui-lib:
reduce-expansion-comparison.pdf

@pnwamk
Member

pnwamk commented May 22, 2018

Two files from plot-gui-lib:

Here's a diff from lazy-snip-typed_rkt.zo, a file that goes from 3.8K to 8K after this PR: https://gist.github.com/pnwamk/716badac9a7c9f21b43790df413e3a01

Here's a diff from plot2d_rkt.zo, a file that goes from 36K to 54K after this PR: https://gist.github.com/pnwamk/a2ffb48dbef3c1c19415fcc8e43226d5

@samth
Sponsor Member Author

samth commented May 22, 2018

These diffs indicate two problems:

  1. Eagerly requiring every module that could contribute a type or contract to the expansion generates too many requires to be tenable. I need to fix that, which will remove much of the extra code in plot2d.rkt.
  2. There's a bunch of syntax object serialization in lazy-snip-typed_rkt.zo that I don't understand. As far as I can tell, it generates two submodules that provide nothing and have a body of (void), but have a big syntax object that's not referenced anywhere. @mflatt, is there something obvious I should look for that would cause that sort of thing?

@samth
Sponsor Member Author

samth commented May 22, 2018

Ok, another problem. Starting DrRacket with this change causes the following error (some of the module language code is in Typed Racket). What does this error indicate, and what am I doing wrong?

instance-variable-value: instance variable not found
  instance: 'empty-stx/empty-ns
  name: .deserialize-syntax
  context...:
   temp35_0
   for-loop
   [repeats 1 more time]
   do-attach-module17
   /home/samth/sw/plt/extra-pkgs/drracket/drracket/drracket/private/insulated-read-language.rkt:235:0: make-irl
   /home/samth/sw/plt/racket/collects/racket/contract/private/arrow-higher-order.rkt:361:33
   /home/samth/sw/plt/extra-pkgs/drracket/drracket/drracket/private/module-language-tools.rkt:206:4
   /home/samth/sw/plt/racket/collects/racket/private/class-internal.rkt:3554:0: continue-make-object
   [repeats 4 more times]
   /home/samth/sw/plt/extra-pkgs/drracket/drracket/gui-debugger/debug-tool.rkt:177:6
   /home/samth/sw/plt/racket/collects/racket/private/class-internal.rkt:3554:0: continue-make-object
   [repeats 5 more times]
   /home/samth/sw/plt/racket/collects/racket/private/class-internal.rkt:3508:0: do-make-object
   /home/samth/sw/plt/extra-pkgs/drracket/drracket/drracket/private/unit.rkt:1402:4
   /home/samth/sw/plt/racket/collects/racket/private/class-internal.rkt:3554:0: continue-make-object
   /home/samth/sw/plt/extra-pkgs/drracket/drracket/drracket/private/module-language.rkt:1578:4

@mflatt
Member

mflatt commented May 22, 2018

The serialization in lazy-snip-typed_rkt.zo can be reduced by adding (#%declare #:empty-namespace) to the submodules on lines 92-93 in "extra-env-lang.rkt".
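A minimal sketch of where such a declaration goes, assuming a Racket version whose `#%declare` form accepts `#:empty-namespace`. The submodule name `names` and its contents are hypothetical and are not the code in extra-env-lang.rkt.

```racket
#lang racket/base

;; A submodule that opts out of full module->namespace support, so less
;; lexical information needs to be kept in its compiled form.
(module names racket/base
  (#%declare #:empty-namespace)
  (provide greeting)
  (define greeting "hello"))

;; The submodule is still required and used as usual.
(require 'names)
greeting
```

The declaration changes what must be preserved for reflection on the module, not what the module provides, so clients that only require it are unaffected.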

@mflatt
Member

mflatt commented May 22, 2018

I'll have to investigate the instance-variable-value problem. At some level, the module-to-linklet compiler decided that a linklet would not try to access any syntax objects, but something is trying to do that after all.

@mflatt
Member

mflatt commented May 22, 2018

Commit racket/racket@ab7dffa fixes the instance-variable-value problem for starting DrRacket.

samth added 13 commits May 22, 2018 16:23
* Mark all multiply-referenced types as popular, but avoid
  extraneous popular types caused by them being contained in
  other popular types. Thanks to @mflatt for help with this.

* Create some more simple constructors to use in generated code.

* Define some popular types as pre-defined.

Together, this reduces the size of framework-types.rkt's zo file
by about 100k.
This also sets up the infrastructure for sharing contracts between
modules that use TR, but doesn't use that infrastructure automatically
yet, just for explicitly predefined types.

Reduce .zo size for plot-gui-lib by about a factor of 11.
Since GUI types are rarely used in practice on typed/untyped boundaries
(they were mostly removed from plot) it's not clear that adding all
of them to the zo size of "typed-racket-more" is the right thing. This
makes the current change just set up the infrastructure.
@samth
Sponsor Member Author

samth commented May 22, 2018

After those changes (thanks @mflatt) lazy-snip-typed_rkt.zo is now 5.4k. Next step is to fix the excessive requires.

@pnwamk
Member

pnwamk commented May 22, 2018 via email

@samth
Sponsor Member Author

samth commented May 22, 2018 via email

@sorawee sorawee removed this from the 6.12 milestone Nov 9, 2023

7 participants